Discrimination between Printed and Handwritten Text in Documents

Authors

  • M. S. Shirdhonkar
  • Manesh B. Kokare
Abstract

Recognition techniques for printed and handwritten text in scanned documents differ significantly. In this paper, we propose a method to automatically identify signatures in scanned document images, which makes it possible to retrieve document images based on their signatures. A simple region-growing algorithm is used to segment each document into a number of patches. A patch is composed of many closely located components, where a component is a single piece of connected foreground pixels (e.g., under 8-connectivity). We extract the state features of all the patches to identify the signature in the document images. A label for each segmented patch is inferred using a neural network (NN) model and a support vector machine (SVM). These models are flexible enough to treat a signature as a type of handwriting and to isolate it from machine print. Experimental results show that the SVM achieves a higher classification rate than the NN.

General Terms: Pattern recognition, data mining, document image retrieval.
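The abstract describes a two-step segmentation: pixels are grouped into 8-connected components, and closely located components are then grown into patches. The Python sketch below illustrates that step under stated assumptions; it is not the authors' implementation.

- The page is a binary image given as a list of lists, with 1 for foreground.
- Region growing is approximated by repeatedly merging component bounding boxes that lie within a hypothetical `gap` threshold.
- The helper names are illustrative.
- Feature extraction and the NN/SVM classification are not shown.

```python
from collections import deque

# Minimal sketch (assumption): binary page image as a list of lists, 1 = ink.
# Helper names and the `gap` threshold are hypothetical, not from the paper.

# The eight neighbours of a pixel under 8-connectivity.
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1),
                (0, -1),           (0, 1),
                (1, -1),  (1, 0),  (1, 1)]


def connected_components(image):
    """Return 8-connected foreground components as lists of (row, col) pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in NEIGHBOURS_8:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a component's pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def boxes_close(a, b, gap):
    """True if boxes a and b overlap or are within `gap` pixels of each other."""
    return not (a[2] + gap < b[0] or b[2] + gap < a[0] or
                a[3] + gap < b[1] or b[3] + gap < a[1])


def grow_patches(components, gap=2):
    """Group components into patches (returned as bounding boxes).

    A patch starts from one component's box and keeps absorbing any remaining
    component whose box lies within `gap` pixels of the patch's current box.
    """
    remaining = [bounding_box(p) for p in components]
    patches = []
    while remaining:
        patch = remaining.pop(0)
        grew = True
        while grew:
            grew = False
            for i, box in enumerate(remaining):
                if boxes_close(patch, box, gap):
                    # Expand the patch box to cover the absorbed component.
                    patch = (min(patch[0], box[0]), min(patch[1], box[1]),
                             max(patch[2], box[2]), max(patch[3], box[3]))
                    remaining.pop(i)
                    grew = True
                    break  # rescan, since the patch box has changed
        patches.append(patch)
    return patches


# Toy page: two nearby strokes on the left, one isolated stroke on the right.
page = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
]
components = connected_components(page)
print(len(components), grow_patches(components, gap=2))
# prints: 3 [(0, 0, 2, 3), (2, 8, 3, 8)]
```

With `gap=0` the same three components remain three separate patches. The threshold therefore controls how aggressively nearby strokes, such as the pieces of a signature, are merged before per-patch features are computed.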

Similar resources

Classification of Printed and Handwritten Text: a Review

Separating handwritten and machine-printed text in a document has many applications. Various types of documents used in daily life, such as bank cheques and forms, contain both handwritten and printed text. It is necessary to separate handwritten and machine-printed text before processing it with an optical character recognition system. Various strategies are used to discrimina...

Language Identification in Document Images

This paper presents a system dedicated to automatic language identification of text regions in heterogeneous and complex documents. This system is able to process documents with mixed printed and handwritten text and various layouts. To handle such a problem, we propose a system that performs the following sub-tasks: writing type identification (printed/handwritten), script identification and l...

Distinction between Machine Printed Text and Handwritten Text in a Document

In many documents, machine-printed and handwritten texts are intermixed. Optical Character Recognition (OCR) techniques differ for machine-printed and handwritten text, so it is necessary to separate these texts before giving them as input to the OCR. In this paper we propose a methodology for the Hindi language. This methodology is based on structural features of text. Experimental results on a data...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Identification of Arabic/French Handwritten/Printed Words using GMM-Based System

The discrimination between languages is one of the first steps in the problem of automatic document text recognition. In many documents, such as bank checks and application forms, printed and handwritten texts are mixed. In this paper, an automatic identification system for Arabic and French words in both handwritten and printed script, based on Gaussian Mixture Models (GMMs), is presented. A fi...

Distinction between handwritten and machine-printed text based on the bag of visual words model

In a variety of documents, ranging from forms to archive documents and books with annotations, machine printed and handwritten text may coexist in the same document image, raising significant issues within the recognition pipeline. It is, therefore, necessary to separate the two types of text so that it becomes feasible to apply different recognition methodologies to each modality. In this pape...

Publication date: 2010